Real-world applications often require learning continuously from a stream of data under ever-changing conditions. When learning from such non-stationary data, deep neural networks (DNNs) undergo catastrophic forgetting of previously learned information. Among the common approaches to avoiding catastrophic forgetting, rehearsal-based methods have proven effective. However, they remain prone to forgetting due to task interference, as all parameters respond to all tasks. To counter this, we take inspiration from sparse coding in the brain and introduce dynamic modularity and sparsity (Dynamos) for rehearsal-based general continual learning. In this setup, the DNN learns to respond to stimuli by activating relevant subsets of neurons. We demonstrate the effectiveness of Dynamos on multiple datasets under challenging continual learning evaluation protocols. Finally, we show that our method learns representations that are modular and specialized, while maintaining reusability by activating subsets of neurons with overlaps corresponding to the similarity of stimuli.
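As an illustration of the core idea of input-dependent activation of neuron subsets, here is a minimal PyTorch sketch of a dynamically gated layer. The top-k gating scheme and all layer sizes are illustrative assumptions, not the Dynamos architecture itself.

```python
import torch
import torch.nn as nn

class DynamicSparseLayer(nn.Module):
    """A linear layer whose units are gated per input: only the k
    highest-scoring units fire, so different stimuli activate different,
    possibly overlapping, subsets of neurons."""

    def __init__(self, in_features: int, out_features: int, k: int):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.gate = nn.Linear(in_features, out_features)  # per-unit relevance scores
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.fc(x))
        scores = self.gate(x)
        # Keep only the k highest-scoring units per sample; zero out the rest.
        topk = scores.topk(self.k, dim=-1).indices
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)
        return h * mask

layer = DynamicSparseLayer(in_features=128, out_features=256, k=32)
out = layer(torch.randn(8, 128))  # each sample uses its own 32 active units
```

Similar inputs would produce similar gate scores and hence overlapping active subsets, which is the reusability property the abstract describes.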
Advances in deep learning have led to steady progress in computer vision, improving accuracy on tasks such as object detection and semantic segmentation. However, deep neural networks are vulnerable to adversarial attacks, posing a challenge to their reliable deployment. Two prominent tasks in 3D scene understanding for robotics and advanced driver assistance systems are monocular depth and pose estimation, which are often learned together in an unsupervised manner. While studies exist that evaluate the impact of adversarial attacks on monocular depth estimation, a systematic demonstration and analysis of adversarial perturbations against pose estimation is lacking. We show that additive imperceptible perturbations can not only change predictions so as to increase trajectory drift, but also alter its geometry. We also study the relationship between adversarial perturbations targeting monocular depth and pose estimation networks, as well as the transferability of perturbations to other networks with different architectures and losses. Our experiments show how the generated perturbations lead to significant errors in relative rotation and translation predictions, and elucidate the vulnerabilities of the networks.
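To make the attack setting concrete, the following is a minimal PGD-style sketch for crafting an additive, norm-bounded perturbation against a monocular pose network. The `pose_net` model, its input format, and all hyperparameters are placeholders; the paper's actual optimization may differ.

```python
import torch
import torch.nn.functional as F

def pgd_perturbation(pose_net, frames, eps=2 / 255, alpha=0.5 / 255, steps=40):
    """Craft an additive perturbation, bounded by eps, that pushes the
    network's relative-pose prediction away from its clean prediction,
    increasing trajectory drift. pose_net and frames are stand-ins."""
    clean_pose = pose_net(frames).detach()         # anchor: unattacked prediction
    delta = torch.zeros_like(frames, requires_grad=True)
    for _ in range(steps):
        adv_pose = pose_net(frames + delta)
        loss = F.mse_loss(adv_pose, clean_pose)    # distance from clean output
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()     # ascend: maximize the drift
            delta.clamp_(-eps, eps)                # keep it imperceptible
            delta.grad.zero_()
    return delta.detach()
```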
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
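A minimal sketch of the modeling setup, assuming a standard BERT encoder with a scalar regression head shared between the auxiliary salience task and the popularity task; the paper's exact architecture and training procedure may differ.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SentenceScorer(nn.Module):
    """BERT encoder with a scalar regression head. For the transfer recipe
    described above, one would first train on salience labels, then
    fine-tune the same weights on popularity labels."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.head(cls).squeeze(-1)  # one score per sentence

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SentenceScorer()
batch = tokenizer(["A sentence whose popularity we forecast."],
                  return_tensors="pt", padding=True)
score = model(batch["input_ids"], batch["attention_mask"])
```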
We present Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. For 9 out of the 11 languages, it contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location, and Organization). The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from English sentences to the corresponding Indian-language sentences. We also create manually annotated test sets for 8 languages, containing approximately 1,000 sentences per language. We demonstrate the utility of the obtained dataset on existing test sets and on the Naamapadam test data for 8 Indic languages. We also release IndicNER, a multilingual mBERT model fine-tuned on the Naamapadam training set. IndicNER achieves the best F1 on the Naamapadam test set compared to an mBERT model fine-tuned on existing datasets, with an F1 score above 80 for 7 out of 11 Indic languages. The dataset and models are available under open-source licenses at https://ai4bharat.iitm.ac.in/naamapadam.
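For readers who want to try the released model, a minimal inference sketch using the Hugging Face transformers library follows; the model identifier is assumed from the ai4bharat naming convention and should be verified against the release page.

```python
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          pipeline)

# Assumed model id following the ai4bharat naming convention; check the
# release page linked above for the authoritative identifier.
model_name = "ai4bharat/IndicNER"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Aggregate sub-word predictions into whole entity spans.
ner = pipeline("ner", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
print(ner("सचिन तेंदुलकर मुंबई में रहते हैं।"))  # expect Person / Location spans
```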
It is essential to classify brain tumors from magnetic resonance imaging (MRI) accurately for better and timely treatment of patients. In this paper, we propose a hybrid model that uses VGG together with non-linear SVMs (soft and hard) to classify brain tumors: glioma versus pituitary, and tumorous versus non-tumorous. The VGG-SVM model is trained on two different two-class datasets; thus, we perform binary classification. The VGG models are trained via the PyTorch Python library to obtain the highest testing accuracy of tumor classification. The method is threefold: in the first step we normalize and resize the images, the second step extracts features through variants of the VGG model, and the third step classifies brain tumors using non-linear SVMs (soft and hard). We obtained 98.18% accuracy on the first dataset and 99.78% on the second using VGG19. On D1, the non-linear SVM classification accuracies are 95.50% and 97.98% with the linear and RBF kernels, respectively, and 97.95% for the soft SVM with an RBF kernel; on D2, they are 96.75% and 98.60% with the linear and RBF kernels, and 98.38% for the soft SVM with an RBF kernel. The results indicate that the hybrid VGG-SVM model, especially VGG19 with SVM, outperforms existing techniques and achieves high accuracy.
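A sketch of the three-step pipeline (normalize and resize, extract VGG19 features, classify with a non-linear SVM) using torchvision and scikit-learn; the preprocessing constants and the SVM's C value are illustrative choices, not the paper's exact configuration.

```python
import torch
from torchvision import models, transforms
from sklearn.svm import SVC

# Feature extractor: VGG19 with the classifier head removed, so the forward
# pass returns the flattened convolutional features per image.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
vgg.classifier = torch.nn.Identity()
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                       # step 1: resize
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],     # step 1: normalize
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract(images):                                     # step 2: VGG features
    batch = torch.stack([preprocess(im) for im in images])
    return vgg(batch).numpy()

# Step 3: non-linear SVM on the extracted features.
# A large C approximates a hard-margin SVM; a moderate C gives a soft margin.
svm = SVC(kernel="rbf", C=1.0)
# svm.fit(extract(train_images), train_labels)
# svm.predict(extract(test_images))
```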
Language models are pre-trained on large amounts of general-domain data, such as BookCorpus, Common Crawl, and Wikipedia, which is essential for the model to learn the linguistic characteristics of language. Recent work has proposed domain-adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT) as intermediate steps before the final fine-tuning task. These steps help cover the target-domain vocabulary and improve model performance on downstream tasks. In this work, we study the impact on model performance of training only the embedding layer during TAPT and task-specific fine-tuning. Based on our study, we propose a simple approach to make this intermediate step more efficient for BERT-based models by selectively pre-training BERT layers. We show that training only the BERT embedding layer during TAPT is sufficient to adapt to the target-domain vocabulary and achieve comparable performance. Our approach is computationally efficient, training 78% fewer parameters during TAPT. The proposed embedding-layer fine-tuning approach can also serve as an effective domain adaptation technique.
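The selective pre-training step reduces, in a few lines, to freezing everything except the embedding layer before running masked-language-model training on task data. A sketch with Hugging Face transformers, assuming a standard BERT checkpoint:

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Freeze everything, then unfreeze only the embedding layer, so the
# masked-language-modeling updates during TAPT adapt the vocabulary
# representations while the encoder layers stay fixed.
for param in model.parameters():
    param.requires_grad = False
for param in model.bert.embeddings.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable / total:.1%} of parameters")
```

The partially frozen model is then trained as usual on the task corpus with an MLM objective (e.g. via the standard Trainer with a masking data collator).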
Multiple business scenarios require automatically generating descriptive, human-readable text from structured input data. Fact-to-text systems have therefore been developed for various downstream tasks, largely thanks to the high availability of relevant datasets. Only recently was the problem of cross-lingual fact-to-text (XF2T) generation proposed, targeting generation in multiple languages, together with a dataset, XAlign, covering eight languages. However, there has been no rigorous work on the actual XF2T generation problem. We extend the XAlign dataset with annotated data for four more languages: Punjabi, Malayalam, Assamese, and Oriya. We conduct an extensive study using popular Transformer-based text generation models on the extended multilingual dataset, which we call XAlignV2. Furthermore, we investigate the performance of different text generation strategies: multiple variations of pre-training, fact-aware embeddings, and structure-aware input encoding. Our extensive experiments show that a multilingual mT5 model that uses fact-aware embeddings with structure-aware input encoding achieves the best results on average across the twelve languages. We make our code, dataset, and models publicly available, and hope this will help advance further research in this critical area.
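A minimal sketch of what structure-aware input encoding might look like with mT5: each fact is linearized with explicit relation and object tags before generation. The tag vocabulary, prompt layout, and model size below are illustrative assumptions, not the paper's configuration, and the model would first be fine-tuned on XAlignV2-style pairs.

```python
from transformers import MT5ForConditionalGeneration, MT5Tokenizer

# Each (relation, object) fact is linearised with explicit tags so the
# encoder sees the input's structure; the tags here are illustrative.
facts = [("occupation", "cricketer"), ("birth place", "Mumbai")]
source = "generate Hindi | <S> Sachin Tendulkar " + " ".join(
    f"<R> {rel} <O> {obj}" for rel, obj in facts)

tokenizer = MT5Tokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
# In practice, fine-tune on (linearised facts -> target-language text) pairs
# before decoding; an off-the-shelf checkpoint will not produce useful text.

inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```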
With the tremendous growth of the e-commerce domain, product recommendation has become an area of increasing interest for e-commerce companies. One of the most difficult tasks in product recommendation is size and fit prediction. In e-fashion, there are many fit-related returns and refunds, which inconvenience customers and cause losses for companies. A good size and fit recommendation system that predicts the correct size for a customer can therefore not only reduce such returns and refunds, but also improve the customer experience. Early works in this field used traditional machine learning approaches to estimate customer and product sizes from purchase history. These methods suffer from the cold-start problem due to the huge sparsity of customer-product data. More recently, deep learning has been used to address this problem by embedding customer and product features. However, none of these approaches incorporate the valuable customer feedback present on product pages alongside customer and product features. We propose a novel approach that uses information from customer reviews along with customer and product features for size and fit prediction. We demonstrate the effectiveness of our approach, compared to using product and customer features alone, on 4 datasets. Our method shows an improvement of 1.37%-4.31% in F1 (macro) score over the baselines on the 4 different datasets.
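A minimal sketch of the idea of fusing review text with customer and product features for fit prediction; the dimensions, the review encoder, and the three-way label set are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class SizeFitModel(nn.Module):
    """Concatenate a customer embedding, a product embedding, and an
    encoding of the product's review text, then predict fit as one of
    {small, fit, large}. All sizes here are placeholder choices."""

    def __init__(self, n_customers, n_products, text_dim=768, emb_dim=64):
        super().__init__()
        self.customer = nn.Embedding(n_customers, emb_dim)
        self.product = nn.Embedding(n_products, emb_dim)
        self.review_proj = nn.Linear(text_dim, emb_dim)  # e.g. a BERT pooled vector
        self.classifier = nn.Sequential(
            nn.Linear(3 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 3))

    def forward(self, customer_ids, product_ids, review_vecs):
        z = torch.cat([self.customer(customer_ids),
                       self.product(product_ids),
                       self.review_proj(review_vecs)], dim=-1)
        return self.classifier(z)  # logits over {small, fit, large}

model = SizeFitModel(n_customers=1000, n_products=500)
logits = model(torch.tensor([3]), torch.tensor([7]), torch.randn(1, 768))
```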
This work presents the implementation of a social architecture model for authoring non-player characters (NPCs) in open-world games, inspired by academic research on agent-based modeling. Believable NPC authoring is burdensome in terms of rich dialogue and responsive behaviors. We briefly present the characteristics and advantages of using a social agent architecture for this task, and describe an implementation of the social agent architecture CIF-CK as a mod for social NPCs.
We present Generalizable NeRF Transformer (GNT), a pure, unified transformer-based architecture that efficiently reconstructs Neural Radiance Fields (NeRFs) from source views. Unlike prior works on NeRF that optimize a per-scene implicit representation by inverting a handcrafted rendering equation, GNT achieves generalizable neural scene representation and rendering by encapsulating two transformer-based stages. The first stage of GNT, called the view transformer, leverages multi-view geometry as an inductive bias for attention-based scene representation, and predicts coordinate-aligned features by aggregating information from epipolar lines on neighboring views. The second stage of GNT, named the ray transformer, renders novel views via ray marching and directly decodes the sequence of sampled point features using an attention mechanism. Our experiments demonstrate that, when optimized on single scenes, GNT can successfully reconstruct NeRF without an explicit rendering formula, and even improves PSNR by ~1.3 dB on complex scenes thanks to the learnable ray renderer. When trained across various scenes, GNT consistently achieves state-of-the-art performance when transferring to the forward-facing LLFF dataset (LPIPS ~20%, SSIM ~25%) and the synthetic Blender dataset (LPIPS ~20%, SSIM ~4%). In addition, we show that depth and occlusion can be inferred from the learned attention maps, which implies that a pure attention mechanism is capable of learning a physically grounded rendering process. All these results bring us one step closer to the tantalizing hope of using transformers as "universal modeling tools", even for graphics. Please see our project page for video results: https://vita-group.github.io/gnt/.
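To convey the ray-transformer idea, here is a hedged PyTorch sketch that replaces explicit volume rendering with self-attention over the features of the points sampled along each ray; layer sizes and the aggregation scheme are illustrative, not GNT's actual design.

```python
import torch
import torch.nn as nn

class RayTransformer(nn.Module):
    """Instead of alpha-compositing with the volume-rendering equation,
    self-attend over the per-ray sequence of sampled point features and
    decode a colour from the aggregate. Sizes are placeholder choices."""

    def __init__(self, feat_dim=64, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(feat_dim, n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.to_rgb = nn.Linear(feat_dim, 3)

    def forward(self, point_feats):
        # point_feats: (n_rays, n_samples, feat_dim), e.g. the
        # coordinate-aligned features produced by the view transformer.
        attended = self.encoder(point_feats)
        return self.to_rgb(attended.mean(dim=1))  # one RGB value per ray

renderer = RayTransformer()
rgb = renderer(torch.randn(1024, 64, 64))  # 1024 rays, 64 samples each
```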